ABSTRACT

Pearson's unrestricted chi-square procedure is reviewed, and an
historical presentation of Neyman's restricted chi-square test is
introduced with a discussion of its theory and applicability to
education. An example of the Neyman procedure is discussed in detail
to familiarize researchers with this useful technique for analyzing
contingency tables. The analysis also displays the need for
researchers to check model assumptions and power in order to produce
constructive analysis. This presentation of a statistical procedure
developed by mathematical statisticians allows researchers in the
behavioral sciences a facility with the method for application in
their particular research. (Author/PR)

Neyman's Restricted Chi-Square Tests

Neil H. Timm
University of Pittsburgh

Abstract

A historical presentation of Neyman's restricted chi-square tests is
introduced with a discussion of its theory and applicability to education
included. This presentation of a statistical procedure developed by
mathematical statisticians allows researchers in the behavioral sciences a
facility with the method for application in their particular research.

Introduction 

Karl Pearson's (1900) chi-square test criterion for contingency tables
has been employed in many areas of educational research wherever mutually
exclusive and exhaustive qualitative events occur. Neyman's (1949)
classical paper, which has remained unnoticed by educational methodologists
along with the Fix, Hodges and Lehmann (1959) article, extends the analysis
of categorized data by restricting the class of admissible hypotheses.
Before considering the consequences of Neyman's modification of the
chi-square test, a review of Pearson's (unrestricted) chi-square procedure
will be made.

The Unrestricted Chi-Square Test 

For illustration, consider a two-way r x c contingency table. Suppose
n observations can be classified according to two characteristics, A and B,
with $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_c$ mutually exclusive
and exhaustive categories. The observed frequency in the ij-th cell is
denoted by $n_{ij}$ and the probability represented by $p_{ij}$. Also define

$$n_{i.} = \sum_j n_{ij}, \quad n_{.j} = \sum_i n_{ij}, \quad p_{i.} = \sum_j p_{ij}, \quad p_{.j} = \sum_i p_{ij}.$$

Since each of the n observations has to be classified into one of the rc
cells, it follows that

$$\sum_i \sum_j n_{ij} = \sum_i n_{i.} = \sum_j n_{.j} = n$$

$$\sum_i \sum_j p_{ij} = \sum_i p_{i.} = \sum_j p_{.j} = 1$$

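The class of admissible hypotheses $\Omega$ is represented by

$$\Omega = \{\, p_{ij} : p_{ij} > 0 \text{ with } \sum_i \sum_j p_{ij} = 1,\ i = 1, \ldots, r,\ j = 1, \ldots, c \,\}$$

where the parameters are $p_{11}, p_{12}, \ldots, p_{rc}$ and the sample is
$n_{11}, n_{12}, \ldots, n_{rc}$.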

Although there are many hypotheses which may be tested under the above
model, only the test of independence is considered. To test the hypothesis
H of independence

$$H: p_{ij} = p_{i.}\, p_{.j}, \quad i = 1, \ldots, r,\ j = 1, \ldots, c$$

that the parameters belong to $\omega$, a subset of $\Omega$, where

$$\omega = \{\, p_{ij} \text{ such that } p_{ij} \in \Omega \text{ and } p_{ij} = p_{i.}\, p_{.j},\ i = 1, \ldots, r,\ j = 1, \ldots, c \,\}$$

it is necessary to find estimates of $p_{ij}$ that minimize

$$X_H^2 = \sum_i \sum_j \frac{(n_{ij} - n p_{ij})^2}{n p_{ij}} = \sum_{i,j} \frac{(O - E)^2}{E}$$

under $\omega$ such that $\sum_i \sum_j p_{ij} = 1$.

In general, the distribution of $X_H^2$ depends on the estimation
procedure and on the number of unknown parameters. R.A. Fisher (1924) was
the first to find the limiting distribution of $X_H^2$ for an important
class of methods of estimation under rather general conditions. Fisher
illustrated that as n tends to $\infty$ and the number of cells remains
fixed, the distribution of $X_H^2$ tends to a $\chi^2$ distribution on f(H)
degrees of freedom; f(H) being the number of cells minus the number of
independent parameters estimated minus 1. Cramer (1946, page 4?4) and
Neyman (1949) extended Fisher's results. Neyman showed his outcome true for
any best asymptotically normal (BAN) estimates, which include maximum
likelihood, minimum chi-square, minimum modified chi-square and Neyman's
minimum "linearized" chi-square estimates.

The distribution of the random variables $N_{11}, N_{12}, \ldots, N_{rc}$
under the above model is, by definition, multinomial,

$$P(N_{11} = n_{11}, \ldots, N_{rc} = n_{rc}) = \frac{n!}{\prod_i \prod_j n_{ij}!} \prod_i \prod_j p_{ij}^{n_{ij}}$$

with $\sum_i \sum_j n_{ij} = n$ and $\sum_i \sum_j p_{ij} = 1$. Under H, the
maximum likelihood estimate of $p_{ij}$ is $\hat{p}_{ij} = n_{i.} n_{.j} / n^2$
so that the observed value of $X_H^2$ is

$$X_H^2 = \sum_i \sum_j \frac{(n_{ij} - n_{i.} n_{.j}/n)^2}{n_{i.} n_{.j}/n}$$



If n is large and not too many of the $p_{ij}$'s are small (all
$np_{ij}$'s $\geq 3$), then $X_H^2$ is distributed approximately under H as
a $\chi^2$ with $f(H) = rc - (r + c - 2) - 1 = (r-1)(c-1)$ degrees of
freedom. This procedure may be used with caution even if some of the
expected cell frequencies are less than 1 provided n is large and rc is
moderate (greater than 6).
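
As a minimal sketch of the unrestricted procedure just described, the
following Python function computes $X_H^2$ for an r x c table using the
maximum likelihood expected frequencies $n_{i.} n_{.j}/n$. The function
name and the small 2 x 2 table are illustrative assumptions and do not come
from the paper.

```python
def pearson_independence(table):
    """Return (X2, df) for Pearson's unrestricted test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # n_i. * n_.j / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r-1)(c-1)
    return x2, df


# Hypothetical 2 x 2 table, used only to exercise the function.
x2, df = pearson_independence([[30, 20], [20, 30]])
print(x2, df)
```

The statistic would then be compared with a tabled chi-square percentile on
`df` degrees of freedom, subject to the sample-size cautions given above.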

One often finds that researchers regroup their data to remove low
expected values. This procedure affects the power of the chi-square test,
and if estimates of the parameters are based on the ungrouped data the
limiting distribution of $X_H^2$ is not chi-squared (as it is when the
estimates are based on the grouped data) according to Chernoff and Lehmann
(1954).

The unrestricted chi-square test may be used to test nearly any
hypothesis in which observations are grouped into distinct cells (a weak
restriction) for which BAN estimates of the expected cell frequencies can be
found. The test is easy to apply, consistent against all alternatives to
the hypothesis tested and sensitive in all directions. However, in any
particular problem it may be less desirable than a test designed to test
particular alternatives.

Restricted Chi-Square Tests 

Under the class $\Omega$ of admissible hypotheses, Neyman imposes
restrictions on the cell probabilities $p_1, \ldots, p_m$. Given these
restrictions, the hypothesis H further constrains the relations among the
$p_k$'s. For example, the probabilities under the general model $\Omega$
may depend in a particular manner on some unknown parameters
$\theta = (\theta_1, \theta_2, \ldots, \theta_s)$ so that we may write
$p_k = p_k(\theta)$ under $\Omega$. The hypothesis H can, for example,
specify that $\theta_2 = 0$. Under this restriction, the unrestricted
chi-square test would seem to be undesirable since this statistic does not
consider what happens under $\Omega$. Intuitively, the chi-square criterion
should test the hypothesis H against $\Omega - H$ and not against the most
general possible alternative. Neyman's restricted chi-square test does
exactly this: the restricted chi-square criterion is the difference

$$X_R^2 = X_H^2 - X_\Omega^2.$$

This difference measures the increase in chi-square imposed by the
additional restrictions in H, over those in $\Omega$.

More formally, suppose n observations $X_1, X_2, \ldots, X_n$ are
classified into m mutually exclusive and exhaustive cells, and that the
number of observations recorded in the k-th cell is $n_k$. Under $\Omega$
assume that the probability of any X occurring in the k-th cell is
$p_k(\theta)$, involving s < m unknown parameters, and under H assume that
the probability is $\pi_k(\theta)$, where $\sum_k p_k(\theta) = 1$ and
$p_k(\theta) > c > 0$ for all k. Further, let $\hat{\theta}$ be a best
asymptotically normal (BAN) estimate of $\theta$ under H and
$\tilde{\theta}$ under $\Omega$. Define the restricted chi-square criterion
as

$$X_R^2 = \sum_{i=1}^{m} \frac{(n_i - n\pi_i(\hat{\theta}))^2}{n\pi_i(\hat{\theta})} - \sum_{i=1}^{m} \frac{(n_i - n p_i(\tilde{\theta}))^2}{n p_i(\tilde{\theta})}$$

with

$$f(R) = f(H) - f(\Omega)$$

degrees of freedom. $f(\Omega)$ is the total number of independent cells
minus the number of independent parameters estimated from the data under
$\Omega$; and f(H) is the same under the hypothesis H.
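
The definition above can be sketched in a few lines of Python: the
restricted criterion is simply the difference of two Pearson statistics
computed from the same observed counts, and its degrees of freedom are the
difference f(H) - f(Omega). The function names and the three-cell numbers
below are made up for illustration; in practice the two expected vectors
would come from BAN fits under H and under Omega.

```python
def pearson_x2(observed, expected):
    """Sum of (O - E)^2 / E over cells given as flat sequences."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def restricted_chi_square(observed, expected_h, expected_omega, f_h, f_omega):
    """Return (X2_R, f(R)) as the difference X2_H - X2_Omega."""
    x2_h = pearson_x2(observed, expected_h)
    x2_omega = pearson_x2(observed, expected_omega)
    return x2_h - x2_omega, f_h - f_omega


# Hypothetical counts and fitted values for a three-cell problem.
x2_r, f_r = restricted_chi_square(
    observed=[10, 20, 30],
    expected_h=[20, 20, 20],
    expected_omega=[12, 18, 30],
    f_h=2,
    f_omega=1,
)
print(x2_r, f_r)
```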

It should be observed that $X_H^2$ is the unrestricted chi-square
described by Pearson and thus has the same number of degrees of freedom as
before.

Neyman (1949) shows that the restricted chi-square criterion $X_R^2$
(given $\Omega$) has, asymptotically as $n \to \infty$ and m remains fixed,
a chi-square distribution on f(R) degrees of freedom. He also shows the
asymptotic equivalence of his criterion with the Wilks $\lambda$-criterion,
a result which is in agreement with the unrestricted case.

The test statistic $X_\Omega^2$ may be employed to test $\Omega$ against
more general hypotheses with only trivial restrictions (such as
$\sum_k p_k = 1$). Thus, $X_\Omega^2$ may be used to test the model. Upon
rejection of the model, one can either relax the conditions under $\Omega$
to some wider class $\Omega^*$ or abandon $\Omega$ altogether and use the
most general class of admissible hypotheses. In this case, $X_H^2$ would
become the test criterion rather than $X_R^2$.

In utilizing the restricted chi-square test in practice, the model
must be tested. The procedure allows one to compute

$X_H^2$ which tests H against all admissible hypotheses
$X_\Omega^2$ which tests $\Omega$ against all admissible hypotheses
$X_R^2$ which tests H against the admissible hypotheses in $\Omega$

Asymptotic Power Calculations

A general idea of the power of the test under consideration is
required in educational research. Basic to the approximation of power in
contingency tables is the noncentral chi-square distribution with
non-centrality parameter $\psi$ and one or more alternative hypotheses
$k \in K$. Wald (1943) has shown how one may estimate the non-centrality
parameter for large samples.

Let H be a particular hypothesis and $T(X_1, \ldots, X_m)$ its test
statistic. Under an alternative $k \in K$, the noncentral chi-square
distribution is supposed to approximate the distribution of T, just as
under H the central chi-square distribution is supposed to approximate the
discrete distribution of $T(X_1, \ldots, X_m)$. Further, the approximate
non-centrality parameter $\psi_k$ under an alternative $k \in K$ is the
value of the statistic $T(X_1, \ldots, X_m)$ with the expected value of
$X_i$ under k substituted everywhere for $X_i$. Hence
$\psi_k = T(E_k X_1, \ldots, E_k X_m)$.

Tables prepared by Fix (1949) and Fix, Hodges and Lehmann (1959) are
available to allow easy power calculations by entering with $\psi_k$ (their
$\lambda$) and degrees of freedom f.
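
The substitution rule can be illustrated for the test of independence: the
Pearson statistic is evaluated with the expected counts under the
alternative in place of the observed counts. This is only a sketch under
stated assumptions; the function name, the 2 x 2 alternative and the sample
size of 100 are hypothetical.

```python
def noncentrality_independence(cell_probs, n):
    """Approximate non-centrality: Pearson's statistic at E_k(N_ij) = n * p_ij."""
    expected_alt = [[n * p for p in row] for row in cell_probs]
    row_totals = [sum(row) for row in expected_alt]
    col_totals = [sum(col) for col in zip(*expected_alt)]
    psi = 0.0
    for i, row in enumerate(expected_alt):
        for j, e_alt in enumerate(row):
            e_null = row_totals[i] * col_totals[j] / n  # fit under independence
            psi += (e_alt - e_null) ** 2 / e_null
    return psi


# Hypothetical alternative k: a 2 x 2 table of cell probabilities.
psi = noncentrality_independence([[0.3, 0.2], [0.2, 0.3]], n=100)
print(psi)
```

Because every expected count is proportional to n, the resulting value
grows linearly with the sample size, which is what makes the rule useful
for planning.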

For Neyman's restricted chi-square test, the same rule applies;
however, $\psi_R$ and f(R) are required.

Example 

An educational researcher elects to study the relationship between
students having to satisfy the statistics methodology requirement and the
advent of those students being placed on scholastic probation. A
(hypothetical) random sample of students who enrolled in the College of
Education has been collected.

Let j denote the number of quarters of statistics completed (j = 0, 1,
2, 3, 4 or more) and i denote whether a student has ever been on probation
(i = 1, yes; i = 2, no). Rather than merely testing whether going on
probation is independent of the number of quarters of statistics completed,
the question might be whether the probability of being on probation
decreases as the number of quarters of statistics completed increases.

Under $\Omega$, the model becomes

$$n_{ij}: \text{Multinomial}(p_{ij}, n) \quad \text{where} \quad n = \sum_i \sum_j n_{ij};\ \sum_i \sum_j p_{ij} = 1$$

and

$$p_{1j} = (\alpha + \beta(j-2))\, p_{.j}, \quad j = 0, 1, 2, 3, 4$$

or since $p_{1j} + p_{2j} = p_{.j}$,

$$p_{2j} = (1 - \alpha - \beta(j-2))\, p_{.j}.$$

Some type of relationship would be expected to exist among the $p_{ij}$: as
j increases, the probability of running into academic difficulty would be
less likely. That the model is linear will have to be checked. The
parameter $\beta$ measures the increased difficulty due to the statistics
requirement, and $\alpha$ is a sort of average.
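
To make the model concrete, the sketch below builds the 2 x 5 table of cell
probabilities from $\alpha$, $\beta$ and the column marginals $p_{.j}$, and
rejects parameter values for which $\alpha + \beta(j-2)$ is not a valid
probability. The parameter values and the equal marginals are hypothetical
and chosen only for illustration.

```python
def model_probabilities(alpha, beta, col_marginals):
    """Return [[p_1j], [p_2j]] under p_1j = (alpha + beta*(j-2)) * p_.j."""
    table = [[], []]
    for j, p_col in enumerate(col_marginals):
        q = alpha + beta * (j - 2)  # conditional probability of probation
        if not 0.0 < q < 1.0:
            raise ValueError(f"alpha + beta*(j-2) = {q} is not a probability")
        table[0].append(q * p_col)          # p_1j
        table[1].append((1.0 - q) * p_col)  # p_2j
    return table


# Hypothetical parameter values and equal column marginals.
probs = model_probabilities(alpha=0.3, beta=-0.1, col_marginals=[0.2] * 5)
print(probs)
```

With a negative $\beta$ the conditional probability of probation falls as j
increases, which is the direction the researcher expects.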

Under the hypothesis H, that no linear relationship exists, it is
desirable to test

$$H: \beta = 0$$

$$K: \beta \neq 0$$

The data for this study follow:

                                j = number of quarters

                            0     1     2     3   4 or more   Total

i = on probation    yes    16     9     3     2       20        50
    at any time     no     11    17     7     4      115       154

                  Total    27    26    10     6      135       204

To apply the Neyman restricted chi-square test to this data, estimates
of $\alpha$ and $\beta$ under $\Omega$ are necessary. From this model

$$p_{1j} = (\alpha + \beta(j-2))\, p_{.j}$$

or

$$\frac{p_{1j}}{p_{.j}} = \alpha + \beta(j-2).$$

Since a multinomial distribution exists, the maximum likelihood estimate of
$p_{1j}/p_{.j}$ is $n_{1j}/n_{.j}$. Letting

$$R(\alpha, \beta) = \sum_{j=0}^{4} \left( \frac{n_{1j}}{n_{.j}} - \alpha - \beta(j-2) \right)^2 n_{.j},$$

taking partial derivatives with respect to $\alpha$ and $\beta$, and
equating to zero, the following equations result:

$$\sum_j n_{1j} = \alpha \sum_j n_{.j} + \beta \sum_j n_{.j}(j-2)$$

$$\sum_j n_{1j}(j-2) = \alpha \sum_j n_{.j}(j-2) + \beta \sum_j n_{.j}(j-2)^2.$$

Employing the data,

$$50 = 204\alpha + 196\beta$$

$$1 = 196\alpha + 680\beta$$

solving this system yields the estimates $\hat{\alpha}_\Omega = .33702$,
$\hat{\beta}_\Omega = -.09567$.
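
The normal equations can be checked numerically. The sketch below forms the
weighted sums from the probation counts $n_{1j}$ and column totals $n_{.j}$
of the example and solves the 2 x 2 system by Cramer's rule; the function
and variable names are illustrative assumptions.

```python
def fit_linear_model(n1, n_col):
    """Solve the two normal equations for (alpha, beta) by Cramer's rule."""
    x = [j - 2 for j in range(len(n_col))]             # centered quarters
    s0 = sum(n_col)                                    # sum n_.j
    s1 = sum(n * xi for n, xi in zip(n_col, x))        # sum n_.j (j-2)
    s2 = sum(n * xi * xi for n, xi in zip(n_col, x))   # sum n_.j (j-2)^2
    t0 = sum(n1)                                       # sum n_1j
    t1 = sum(n * xi for n, xi in zip(n1, x))           # sum n_1j (j-2)
    det = s0 * s2 - s1 * s1
    alpha = (t0 * s2 - s1 * t1) / det
    beta = (s0 * t1 - s1 * t0) / det
    return alpha, beta


# "Yes" row of the example; the j = 1 entry is the column total 26 minus 17.
on_probation = [16, 9, 3, 2, 20]
col_totals = [27, 26, 10, 6, 135]
alpha_hat, beta_hat = fit_linear_model(on_probation, col_totals)
print(round(alpha_hat, 5), round(beta_hat, 5))
```

The intermediate sums reproduce the coefficients 204, 196, 680 and the
left-hand sides 50 and 1 of the system above.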

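Estimating the parameters under H: $\beta = 0$, only $\alpha_H$ needs to
be found. Minimizing

$$G(\alpha) = \sum_j \left( \frac{n_{1j}}{n_{.j}} - \alpha \right)^2 n_{.j}$$

with respect to $\alpha$ yields

$$\hat{\alpha}_H = \frac{n_{1.}}{n} = \frac{50}{204} = .24510$$

Thus, under $\Omega$

$$\hat{p}_{1j} = (.33702 - .09567(j-2))\, \hat{p}_{.j} \qquad (1)$$

and under H

$$\hat{p}_{1j} = (.24510)\, \hat{p}_{.j} \qquad (2)$$

where $\hat{p}_{.0} = .1323$, $\hat{p}_{.1} = .1275$, $\hat{p}_{.2} = .0490$,
$\hat{p}_{.3} = .0294$, and $\hat{p}_{.4} = .6618$.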

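In terms of equations (1) and (2), the expression for $X_R^2$ becomes

$$X_R^2 = X_H^2 - X_\Omega^2 = \sum_{i,j} \frac{(n_{ij} - n\hat{p}_{ij}^{(H)})^2}{n\hat{p}_{ij}^{(H)}} - \sum_{i,j} \frac{(n_{ij} - n\hat{p}_{ij}^{(\Omega)})^2}{n\hat{p}_{ij}^{(\Omega)}}$$

where $\hat{p}_{ij}^{(H)}$ and $\hat{p}_{ij}^{(\Omega)}$ are the estimated
cell probabilities under H and $\Omega$.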

The expected tables under H and $\Omega$ are respectively:

H:                           j = number of quarters

                 0        1        2        3     4 or more   Total

    yes        6.615    6.375    2.450    1.470     33.090       50
    no        20.385   19.625    7.550    4.530    101.910      154

    Total       27       26       10        6        135        204

$\Omega$:                    j = number of quarters

                 0        1        2        3     4 or more   Total

    yes       14.261   11.254    3.369    1.448     19.668       50
    no        12.739   14.746    6.631    4.552    115.332      154

    Total       27       26       10        6        135        204

so that

$X_H^2 = 26.343$ where f(H) = 4
$X_\Omega^2 = 1.590$ where $f(\Omega) = 3$
$X_R^2 = 24.753$ where f(R) = 1
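
The three statistics can be recomputed from the observed table and the
reported estimates. The sketch below is a check on the arithmetic rather
than the author's own computation; the variable names are illustrative, and
because it carries more decimal places than the printed tables, the results
should land close to, but not exactly on, the values reported above.

```python
observed = [[16, 9, 3, 2, 20], [11, 17, 7, 4, 115]]
col_totals = [sum(col) for col in zip(*observed)]  # n_.j
n = sum(col_totals)

alpha_omega, beta_omega = 0.33702, -0.09567        # reported estimates under Omega
alpha_h = sum(observed[0]) / n                     # 50/204 under H


def expected_table(cond_probs):
    """Expected counts n_.j * q_j (yes) and n_.j * (1 - q_j) (no)."""
    yes = [t * q for t, q in zip(col_totals, cond_probs)]
    no = [t * (1.0 - q) for t, q in zip(col_totals, cond_probs)]
    return [yes, no]


def pearson_x2(obs, exp):
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(obs, exp)
               for o, e in zip(obs_row, exp_row))


exp_h = expected_table([alpha_h] * 5)
exp_omega = expected_table([alpha_omega + beta_omega * (j - 2) for j in range(5)])

x2_h = pearson_x2(observed, exp_h)          # tests H, f(H) = 4
x2_omega = pearson_x2(observed, exp_omega)  # tests the model, f(Omega) = 3
x2_r = x2_h - x2_omega                      # tests H within Omega, f(R) = 1

model_tenable = x2_omega < 7.81             # 95th percentile, 3 df (from the text)
reject_h = x2_r > 3.84                      # 95th percentile, 1 df (from the text)
print(round(x2_h, 3), round(x2_omega, 3), round(x2_r, 3), model_tenable, reject_h)
```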

The statistic $X_\Omega^2$ is employed to test the model assumptions.
The degrees of freedom $f(\Omega)$ is obtained by subtracting the number of
independent parameters estimated (6) from the number of independent cells
(9). Since $X_\Omega^2 < \chi_3^2(.95) = 7.81$, the model assumptions are
tenable.

Given that the model assumptions are satisfied, testing the hypothesis
H: $\beta = 0$ against the alternative K: $\beta \neq 0$ may proceed. Since
$X_R^2 > \chi_1^2(.95) = 3.84$ the hypothesis of independence is rejected
(the same conclusion as reached by the unrestricted test).

More information has been gathered by use of the restricted chi-square
procedure because a prediction equation involving the cell probabilities
has been obtained:

$$\hat{p}_{1j} / \hat{p}_{.j} = .33702 - .09567(j-2)$$

$$\hat{p}_{1j} = (.33702 - .09567(j-2))(n_{.j}/n) \quad \text{for } j = 0, 1, 2, 3, 4.$$

This type of relationship could not have been procured by an unrestricted
chi-square procedure. At most, a phi coefficient might have been
calculated.

Having a regression equation for the probabilities, an associated
$(1 - \alpha)$ confidence interval for the coefficient $\beta$ is sought.
Employing the formula,

$$\hat{\beta} - \hat{\sigma}_{\hat{\beta}} \sqrt{\chi_1^2(.95)} \leq \beta \leq \hat{\beta} + \hat{\sigma}_{\hat{\beta}} \sqrt{\chi_1^2(.95)}$$

the following approximate $1 - \alpha$ confidence interval for $\beta$ is
created.

$$-.11 \leq \beta \leq -.08.$$

As expected, zero is not included in the interval, again verifying that
$\beta$ is significantly different from zero.

Power Calculations 

The power of a restricted chi-square test is always greater than that of
an unrestricted chi-square test given that the alternative of interest
reasonably satisfies the model restrictions. Deciding on whether to use
restricted or unrestricted chi-square tests reduces to a choice between
excellent power for a limited class of alternatives or weak power for every
other (Fix, Hodges, Lehmann, 1959).

To compute power in the example considered, all values of $p_{ij}$ for
$p_{ij} \in k$ must be specified and $\psi_R$ by the Wald (1943) procedure
needs to be evaluated.

$$\psi_R = \sum_{i,j} \frac{(E(N_{ij} \mid k) - E(N_{ij} \mid k \mid H))^2}{E(N_{ij} \mid k \mid H)} - \sum_{i,j} \frac{(E(N_{ij} \mid k) - E(N_{ij} \mid k \mid \Omega))^2}{E(N_{ij} \mid k \mid \Omega)} = \psi_H - \psi_\Omega$$

But since $k \in \Omega$, it is observed that $\psi_\Omega = 0$ so that
$\psi_R = \psi_H$ with degrees of freedom equal to f(R). By contrast, the
power of an unrestricted chi-square test is obtained by use of $\psi_H$
with degrees of freedom f(H).

The above relationship indicates the necessity of always testing
the model; when the model assumptions are not satisfied the unrestricted
chi-square procedure should be employed since power will be larger.
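
The effect of the degrees of freedom on power can be sketched numerically.
With the same non-centrality value, the example's restricted test refers to
1 degree of freedom and the unrestricted test to 4. The code below is an
illustration under assumptions: the non-centrality value 10 is
hypothetical, 9.488 is the usual tabled 95th percentile of chi-square on 4
degrees of freedom (not quoted in the paper), and the helper names are
invented. For 1 degree of freedom the power follows from the normal
distribution; for even degrees of freedom it uses the Poisson mixture of
central chi-square tail probabilities.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_df1(psi, crit):
    """P(noncentral chi-square, 1 df, non-centrality psi, exceeds crit)."""
    root_c, root_psi = math.sqrt(crit), math.sqrt(psi)
    return 1.0 - normal_cdf(root_c - root_psi) + normal_cdf(-root_c - root_psi)


def central_tail_even(x, df):
    """Upper tail of a central chi-square with an even number of df."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def power_even_df(psi, crit, df, terms=200):
    """Poisson(psi/2) mixture of central tails with df, df+2, df+4, ..."""
    lam = psi / 2.0
    weight = math.exp(-lam)
    total = 0.0
    for j in range(terms):
        total += weight * central_tail_even(crit, df + 2 * j)
        weight *= lam / (j + 1)
    return total


psi = 10.0                                     # hypothetical non-centrality value
power_restricted = power_df1(psi, 3.84)        # f(R) = 1
power_unrestricted = power_even_df(psi, 9.488, 4)  # f(H) = 4
print(power_restricted, power_unrestricted)
```

At a non-centrality of zero both functions return approximately the nominal
.05 level, which serves as a quick sanity check on the formulas.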

Conclusion 

The purpose of the preceding example and discussion of the Neyman
procedure was to familiarize researchers with a useful technique for
analyzing contingency tables. However, the analysis also displays the
need for researchers to check model assumptions and power in order to
produce constructive analysis.